A method for partitioning trends in genetic mean and variance to understand breeding practices

Background In breeding programmes, the observed genetic change is a sum of the contributions of different selection paths represented by groups of individuals. Quantifying these sources of genetic change is essential for identifying the key breeding actions and optimizing breeding programmes. However, it is difficult to disentangle the contribution of individual paths due to the inherent complexity of breeding programmes. Here we extend the previously developed method for partitioning genetic mean by paths of selection to work both with the mean and variance of breeding values. Methods First, we extended the partitioning method to quantify the contribution of different paths to genetic variance assuming that the breeding values are known. Second, we combined the partitioning method with the Markov Chain Monte Carlo approach to draw samples from the posterior distribution of breeding values and use these samples for computing the point and interval estimates of partitions for the genetic mean and variance. We implemented the method in the R package AlphaPart. We demonstrated the method with a simulated cattle breeding programme. Results We show how to quantify the contribution of different groups of individuals to genetic mean and variance and that the contributions of different selection paths to genetic variance are not necessarily independent. Finally, we observed that the partitioning method under the pedigree-based model has some limitations, which suggests the need for a genomic extension. Conclusions We presented a partitioning method to quantify sources of change in genetic mean and variance in breeding programmes. The method can help breeders and researchers understand the dynamics in genetic mean and variance in a breeding programme. The developed method for partitioning genetic mean and variance is a powerful method for understanding how different selection paths interact within a breeding programme and how they can be optimised. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-023-00804-3.

variation, generation interval, and dissemination. To quantify the contributions, García-Cortés et al. [1] developed a method for analysing the change in the genetic mean by partitioning the breeding values into the contributions of several paths. The method uses the standard partitioning of an individual's breeding value a k into parent breeding values 1 2 a f (k) and 1 2 a m(k) and a Mendelian sampling term w k .
Furthermore, the method assigns parent breeding values and Mendelian sampling terms to analyst-defined paths, such as sex, origin, selection path, etc. By aggregating these partitions by other variables, such as year, the method summarises the contributions of different groups of individuals to the overall genetic trend. This approach has been used to quantify the contributions of different countries to the overall genetic trend in the global Brown Swiss population [2], global and local Holstein populations [2], and Croatian Simmental cattle [3], Croatian Landrace, and Large-White pigs [4]. More recently, the method was used to estimate the starting point of adopting genomic selection by quantifying differences in genetic trends estimated with pedigree-based and singlestep genomic best linear unbiased prediction (BLUP) [5].
In addition to the contribution of paths to changes in genetic mean, breeding programmes should also consider analysing changes in genetic variance to fully understand the source of genetic change in a population [6,7]. Furthermore, managing the change in genetic mean and variance in breeding programmes is essential to ensure a long-term genetic gain [8,9]. Therefore, we must quantify the contribution of different selection paths in a breeding programme to the genetic mean and variance. For example, in several economically important species, male selection and dissemination represent a crucial lever that has the largest impact on a population's genetic mean and variance.
The aim of this paper is to extend the method of García-Cortés et al. [1] to (i) partition the genetic mean and variance, (ii) implement the method in AlphaPart R package, and (iii) apply the partitioning method to estimated breeding values following the work of [6] and [7]. We used simulation to demonstrate the methodology and provide insights on how to use the AlphaPart R package [10] to analyse real data.

Partitioning theory
In this section, we delve into the theory of partitioning breeding values and the computation of their mean and variance.
Let a be a vector of breeding values following a normal distribution with mean 0 and pedigree-based covariance Aσ 2 a . Then, we can write a as a linear combination of the individual's ancestor breeding values and the individual's deviation from the ancestors a = Tw, where T is a lower-triangular matrix of expected gene flow between ancestors and individuals following a pedigree, and w ∼ N 0, Wσ 2 a are Mendelian sampling terms representing the deviations, with W being a diagonal matrix of variance coefficients and σ 2 a the base population (additive) genetic variance [11][12][13][14].
Assuming a factor with p levels, representing our paths of interest, and for any set p j=1 P j = I , García-Cortés et al. [1] partitioned the gene flow matrix into contributions of each path by defining T j = TP j , j = 1, 2, . . . , p , and further partitioned the contribution of each path to breeding values a priori using the equality: García-Cortés et al. [1] further showed that these contributions can be estimated from data collected in breeding programmes (a posteriori). They first calculated the conditional expectation of breeding values given phenotype data ( y ), EBV = a = E a|y . Then they plugged the estimated breeding value (EBV), represented by a , and estimated Mendelian sampling terms ( w) into Eq. (1). This approach enabled them to estimate the conditional expectation of partitions, i.e., a j = E a j |y : By summarising the breeding value partitions over time, García-Cortés et al. [1] quantified the contribution of each path (for example, males vs females, different countries, etc.) to genetic mean over time: µ a t with t = 1, 2, . . . , m . Technically this is achieved by sub-setting the a 1 , a 2 , . . . , a p and averaging each subset to obtain µ a j t where p j=1 µ a j t = µ a t . This method has been implemented in the AlphaPart R package [10,15]. The AlphaPart R package efficiently calculates the partitions by leveraging the sparse T −1 [11][12][13][14], and enables a straightforward summary by one variable, such as year, or combination of variables (interaction), such as year and sex. We refer to this variable as x * t , with t = 1, 2, . . . , m , and m representing the number of distinct categories. Importantly, AlphaPart enables the use of any function to summarise the partitions of breeding values, i.e., f a j .
To enable the use of variance as one of the summary functions in AlphaPart, we extend the partitioning method to analyse the contribution of paths to genetic variance. Variance of breeding values is, a priori, (1) a = T 1 + T 2 + · · · + T p w = a 1 + a 2 + · · · + a p .
Var(a) = Var(Tw) = TWT ⊤ σ 2 a . Using Eq. (1), we can further partition the genetic variance by paths as: Note that A j and A jj ′ are different from the regular numerator relationship matrix A ; for example, some diagonals in A j and A j,j ′ have zero values. Note also that this partitioning of the genetic variance is similar to the multi-breed partitioning of the genetic variance [16]-we parameterise the model with one base population genetic variance. In contrast, García-Cortês and Toro [16] parameterised the model with multiple base population genetic variances and covariances.
While this partitioning by paths may involve dense matrices such as A j and A j,j ′ , we can efficiently calculate the partitions a 1 , a 2 , . . . , a p by working with the sparse T −1 [11][12][13][14]17]. Again variable x * t with m distinct categories, t = 1, 2, . . . , m , is used to summarise the paths. Thus, we can define the genetic variance for the partition j given category t that has n k ≤ nI individuals, k * = 1, 2, . . . , n k , as: where a j t is a column for partition j, but only considering individuals in category t, n k is the number of individuals in category t, and µ a j t = 1 n k n k k * =1 a j t ,k * . Similarly, the genetic covariance between the partitions j and j ′ , j = j ′ , given category t is then: Note that the formulation of variance Eq. (4) and covariance Eq. (5) are similar to the definition in [6] but applied to breeding value partitions. By sub-setting the partitions Var = σ a j t ,a j ′ t . a 1 , a 2 , . . . , a p by the variable x * t , such as year, we can calculate Eqs. (4) and (5) for each category.
It is worth noting that there is a difference between Eq. (3) and Eq. (4) or (5). The σ 2 a in Eq. (3) represents the base population genetic variance, while the expression a describes how the variance changes through a given pedigree and how it partitions by paths. Equations (4) and (5) represent the variance and covariance of breeding value partitions a 1 , a 2 , . . . , a p , that contribute to the total genetic variance but calculated just for individuals in the category t. Therefore, we can partition a population's genetic variance into path contributions, which can be summarised in the same ways as genetic mean [1,18]. Such analyses can quantify the contribution of different paths to changes in genetic mean and variance over time, µ a t and σ 2 a t . For example, to quantify how selection paths by sexes contribute to changes in genetic mean and variance in a breeding programme, as shown in the "Results" section, or to quantify the contribution of different countries (when importing), artificial insemination centres, or breeders.
The presented partitioning of genetic variance holds for true breeding values. However, when EBV are available, we cannot substitute a 1 , a 2 , . . . , a p with their expectations a 1 , a 2 , . . . , a p in Eqs. (4) and (5) as García-Cortés et al. [1] could do it for the partitioning of the genetic mean. To see this, imagine a situation where EBV are based on very limited phenotype information. Such EBV will be shrunken strongly towards zero and will have a low accuracy [14]. As such, these EBV will not be a good representation of true breeding values, and their variance, Var( EBV) = Var E a|y will be much smaller than the variance of breeding values σ 2 a and its time trajectory σ 2 a t . To address this issue, we use the approach from Sorensen et al. [6], and Lara et al. [7] that involves three steps. First, sample breeding values from their posterior distribution [17]. Second, for every sample of breeding values, calculate desired quantities. In our case, the desired quantities are mean and variance of breeding values over time: µ a t and σ 2 a t ; breeding value partitions: a 1 , a 2 , . . . a p ; and mean, variance, and covariance of the partitions over time: µ a j t , σ 2 and σ a j t ,a j ′ t . Multiple samples of these quantities represent their posterior distributions: p µ a t |y , p σ 2 a t |y , p a 1 , a 2 , . . . a p |y , p µ a j t |y , p σ 2 a j t |y , and p σ a j t ,a j ′ t |y .
Third, summarise the samples to describe the posterior distributions of interest.

Statistical model and computational approaches
In the previous subsection, we assumed that the true breeding values were known. However, in reality, we infer breeding values from phenotype data. To this end, we fitted the standard pedigree-based model to data described in the "Simulation" section: where y is a vector of observed phenotypes, b is a vector of fixed effects with the design matrix X , a is a vector of breeding values with the design matrix Z , σ 2 e is a residual variance, A is pedigree-based relationship matrix and σ 2 a is genetic variance in the base population. Additional file 1: Figs. S1 and S2 provide more information about the model definition.
We sampled from the posterior distribution of all model parameters with the Gibbs algorithm (a Markov Chain Monte Carlo (MCMC) method) as implemented in [19]. First, we constructed one chain with 80,000 samples, of which 20,000 were considered burn-in, while the remaining 60,000 were stored and thinned by saving every 40-th sample. Then, we assessed the burn-in convergence by inspecting the trace and auto-correlation plots. Consequently, 1500 samples of breeding values were stored, representing the posterior distribution p a|y . These samples were passed as input to the Alp-haPart R package.
It is imperative to note that the proposed partitioning method requires samples from the posterior distribution p a|y to enable inference of the path contributions to both genetic mean and variance. While we have used the full Bayesian approach with MCMC [17], an alternative is to use the empirical Bayesian approach; estimating variance components with restricted maximum likelihood (REML) and sampling breeding values assuming that variance components are known [7,17]. The full Bayesian approach is recommended to account for uncertainty in estimating all model parameters.

Frequentist measures of model fit and agreement
The partitioning methodology depends on well-calibrated estimates of breeding values. If the used model (6) does not adequately describe the data, estimates of a and derived quantities might be miss-calibrated [20]. Working with simulation, we have the benefit of knowing the true breeding value of individuals ( a ) and can hence evaluate how our estimates of breeding values are calibrated.
First, we evaluated the agreement between true and estimated mean and variance of breeding values over generations using the concordance correlation coefficient defined by [21]. Let t be the index for the generation with t = 1, 2, . . . , m . Recall the mean and variance of true breeding values at generation t respectively as µ a t and σ 2 a t . Moreover, let a = E a|y be the vector of posterior means of individual breeding values in p a|y , and E a t and Var a t , respectively, be the mean and variance of these posterior means at generation t. We then evaluated the agreement between the variables Y * Var Assuming that the pairs of Y * 1 t , Y * 2 t are independent draws from a bi-variate population with means µ 1 and µ 2 and a covariance matrix: we can evaluate the agreement between Y * 1 t and Y * 2 t with the concordance correlation coefficient [21]. This coefficient lies between −1 and 1, and is given by: where ρ is the Pearson correlation coefficient, and C b is the bias correction factor. Here, ρ measures how far each observation deviates from the best-fit line, and C b ∈ [0, 1] measures how far the best-fit line deviates from the identity line y = x and is defined as is a scale shift and u = (µ 1 − µ 2 )/ √ σ 1 σ 2 is a location shift relative to the scale. When C b = 1 , there is no deviation from the identity line, consequently, the quantity of interest is close to the 'truth' . We also used root mean square deviation (RMSD) to measure the bias between Y * 2 t and Y * 1 t , which is given by: We also evaluated the distribution of the difference between true and estimated quantities of interest. We show this evaluation for mean and variance of breeding value partitions over various categories (sex and generation in the example described in the following). Let n r be the number of simulation replicates, r = 1, 2, . . . , n r and a j t ,r the partition of breeding values for the path j category of individuals t in replicate r. We obtained the posterior distribution of our quantities of interest for the partitions and categories in each replicate: p µ a j t ,r |y r , p σ 2 a j t ,r |y r , and p σ a j t ,r , a j ′ t ,r |y r , summarised these posterior distributions with the posterior mean, and calculated the difference between this posterior mean and the corresponding true value, for example, µ a j t ,r − E µ a j t ,r |y r . With a good model fit, we expect that the difference is centred around zero.

AlphaPart implementation
The partitioning method is implemented in the Alp-haPart R package [10,18]. The main input for the analysis is a data frame (data) with: • pedigree information for individual (id), sire (Fid) and dam (Mid); • partition variable (path)-colPath; • breeding values for one or multiple traits-colBV; • grouping variable ( x * t ) used to compute the conditional expectations such as generation, birth year, location, etc.
We partition the breeding values (BV) by paths using: We summarise the partitions using the grouping variable (time) using: where object is an object of class AlphaPart with breeding value partitions, by represents the column by which summary function FUN is applied. For this work, we included an extra argument cov that controls how the covariances are displayed in the output. If cov = FALSE, the default, all covariances are returned in a single column as 2 p−1 j=1 p j ′ =j+1 Cov(a j , a j ′ ) , otherwise, if cov = TRUE, the summary method returns p(p − 1)/2 columns, where each column represents covariances as 2Cov(a j , a j ′ ).
We further describe how to use the posterior samples of breeding values from the "Statistical model and computational approaches" section in AlphaPart. Let T be the number of traits and S be the number of samples of breeding values. Suppose data is a data frame containing columns for individual (id), father (Fid), mother (Mid), path (path), and generation (Gen). Now suppose a more general case where bv_samples represents a data frame containing a column for the individual (id) identification and S columns for the samples of breeding values, as shown in Fig. 1. To prepare the input data for AlphaPart, we can merge the data frames called data and bv_samples into a new data frame called newData (Fig. 1). We can then use the function Alp-haPart() to calculate breeding value partitions with the difference that we now should pass the names of the samples to the argument colBV (Fig. 1). Afterwards, the summary() function can be called to summarise the partitions using an explanatory variable, such as generation (Gen). Since we work with posterior samples of breeding values, we obtain posterior samples for the summaries of the partitions (see the accompanying code). Finally, in the case with more than one trait, we suggest a for loop (possibly parallelised) to create one output per trait, as shown in Fig. 1. In an extreme case with more traits than samples, an alternative approach would be to save one sample of breeding values for multiple traits in one data frame and loop over the samples.

Simulation
To evaluate the method and AlphaPart implementation, we simulated a simple cattle breeding programme over 40 generations with 1000 individuals per generation. The first 20 years represented a burn-in phase, where we selected the best 5 males (out of 500) as sires based on their phenotype and mated them with all 500 females from the previous generation and all 500 females from the current generation. These matings produced 1000 selection candidates for the next generation. After the burn-in phase, we tested two selection scenarios over a further 20 generations: we selected the 5 best males from 500 male candidates based on (i) their phenotypes ('medium-accuracy' scenario, r = 0.3 ) or (ii) true breeding values ('high-accuracy' scenario, r = 1 ), as shown in Fig. 2. We replicated the simulation 30 times with the same founding genomes.
The simulation was done with the AlphaSimR R package version 1.0 [22]. We simulated a cattle genome from the coalescent model with recombination and Holstein demography [23]. The genome had 30 chromosomes and 30,000 quantitative trait loci (QTL). The QTL were randomly sampled from segregating sites and had an additive effect sampled from a normal distribution for a single-trait phenotype with a heritability of 0.3. The above-described breeding programme has a low effective population size (ignoring that we use females for two generations N e ∼ (4 × nSires × nDams)/(nSires + nDams) = (4 × 5 × 1000)/(1005) < 20 ) because our aim was to generate an intense selection situation that would show changes in genetic mean and variance. We split the gene-flow matrix T by specifying male and female paths P m + P f = I . Furthermore, we split the male path into selected and non-selected path P s m + P n m = P m , where P m is a diagonal matrix with 1s in rows for males and zeros otherwise; P f = I − P m ; P s m is a diagonal matrix with ones in rows for selected males, and P n m = P m − P s m is a diagonal matrix with 1s in rows for non-selected males. To facilitate interpretation, we scaled the genetic mean and variance of the base population, respectively, to 0 and 1.

Software implementation
We simulated the cattle breeding programme using AlphaSimR R package [22]. We fitted the model in Eq. (6) using the BLUPF90 family of programs [19], while all post-processing was done in R [15]. To compute and summarise the partitions, we used the AlphaPart R package [10], to prepare data and present results, we used the collection of tidyverse R packages [24] and patchwork R package [25]. The simulation and analysis code is fully available at the GitHub repository https://github.com/HighlanderLab/ toliveira_alphapart_variance.

Partitioning of true breeding values
Analysing true breeding values is essential to demonstrate how the partitioning of breeding values and their means and variances works without the uncertainty of estimating breeding values. Figure 3 shows distributions of true breeding values and partitions over generations for the medium-accuracy scenario. While we partitioned true breeding values, simulation was driven by selection with medium or high accuracy. The accuracy impacted true trends in genetic mean and variance, and we analysed these simulation outputs. Figure 3a shows partitions for female and male paths. As expected, the male path contributed the most to genetic gain, almost twice as much as the female path. Even though there was no selection between females (all females contributed progeny for two generations), the contribution of the female path was significantly different from zero. This shows that nevertheless the female path contributed to the genetic gain, as we will analyse further below.
We now turn attention to the summary of the partitions from Fig. 3 with a mean and variance shown in Fig. 4, focusing on the medium-accuracy scenario. Means of partitions followed the centre of distributions shown in Fig. 3. In contrast, partitioning variances indicated a smaller variation for the male path than for the female    Fig. 4a suggest that the variance of both male and female paths are very similar. This observation raises a question: "How can male and female paths contribute similarly to the genetic variance over time if we were selecting only between males?". The answer to this question is shown in Figs. 3b and 4b, where we partitioned breeding values by sex and selection status. Clearly, non-selected males do not contribute to the change in the genetic mean because their Mendelian sampling terms are distributed around zero in their generation and do not contribute to future generations. However, non-selected males still contribute to the genetic variance in their generation, yet this variation is not passed to the next generation. To separate this temporary contribution to genetic variance, we must define path variables by sex and selection status. By doing this, we see that the main source of change in genetic variance are the five selected males, as expected (Fig. 4b).
The higher accuracy scenario expectedly drove more significant changes in genetic mean and variance than the medium accuracy scenario (Fig. 4). This comparison shows that the contribution of paths to genetic variance is a function of selection accuracy, with higher accuracy driving more changes in genetic variance. Notably, with medium accuracy, we saw a smaller difference between partitions of genetic variance for selected and non-selected males. The main reason for this is that the medium accuracy likely did not enable the selection of the top males from the tail of the distribution, which would have had a much smaller variance. We show the full distribution of the partitioned breeding values in Additional file 2: Fig. S3 and Additional file 3: Fig. S4 over 40 generations.
Splitting the male path into selected and non-selected paths also showed that the negative covariance between male and female partitions in Fig. 4a was driven by the covariance between female and selected male partitions (F:M(S), Fig. 4b). This covariance was consistently negative from generation 8 to 20 in the high-accuracy scenario (Fig. 4b), resulting in a mean correlation of −0.33 ( ±0.15 ) for those generations. As a result, the total genetic variance in a generation t can be smaller than the sum of genetic variances for partitions. This nonindependence of partitions of genetic variance is more evident in the high-accuracy scenario from generations 8 to 20, where the correlation decreased even more than in the medium-accuracy scenario (see Additional file 4: Fig. S5). The non-independence of partitions of genetic variance is yet another reason why individual partitions of genetic variance must be interpreted with caution. We return to this point in the discussion.
To further clarify why female partition had a non-zero contribution to the genetic gain, in spite of the absence of selection among females, Fig. 5 shows the histogram of breeding value partitions and Mendelian sampling terms by the path in generation 39 for medium-accuracy (Fig. 5a) and high-accuracy (Fig. 5b) scenarios. We can see that the female partition contributed significantly to genetic gain (red distribution), although less than the selected males' partition (blue distribution), in each group of individuals (females, non-selected males, and selected males). Expectedly, Mendelian sampling terms for females and non-selected males were distributed around zero (gray distribution), while selected males had consistently positive Mendelian sampling terms. However, females were the progeny of previously selected males, and their sons were subject to selection, which created a non-zero contribution for the female partition (red distribution)-through the dissemination of genes selected in their sires and through their (dam's) sons.
The presented results showed one replicate of the simulation. In Additional file 5: Fig. S6, we show the partitioning analysis for each of the 30 replicates that all used identical founding genomes. Our aim was to show that the above results are consistently observed across many replicates but also to show the magnitude of variation between replicates. The solid line represents the median, and the ribbon represents the distribution of true partitions of genetic mean and variance and the correlation between selected male and female partitions.

Model fit
The data were analysed with model (6) using the complete pedigree that enabled accurate estimation of residual and base population genetic variance. However, we slightly overestimated base population genetic variance in the high-accuracy scenario (Table 1). Evaluating the model further in terms of estimating the quantities of interest, we observed that estimates under the medium accuracy scenario for genetic means over generations were better calibrated than for genetic variance over generations ( Table 2). Under the high-accuracy scenario, the genetic mean over generations was also well estimated, but there was considerable miss-calibration for the genetic variance over generations ( Table 2). The estimated and true genetic means and variances over 40 generations are shown in Additional file 6: Fig. S7 and Additional file 7: Fig. S8. One reason for a worse performance of model (6) under the high-accuracy scenario was that it generated significant genetic change both in mean and variance (Fig. 4, which was also manifested by a higher level of inbreeding than the medium-accuracy scenario (see Additional file 8: Fig. S9). As inbreeding increases over generations, it generates variation between individuals that is challenging to represent using only pedigreebased relationships and better approaches are needed, such as genomic relationships.

Genetic means and its partitions
Now that the adequacy of model (6) has been assessed and its impact on the estimates of genetic means and variances over generations has been evaluated, we show the partitioning results when breeding values are estimated from phenotypes. First, we illustrate partitioning results from a single replicate, then extend it by showing results from 30 replicates. Figure 6 shows the true and estimated genetic mean over 40 generations for the medium-and high-accuracy scenarios considering the total genetic mean (Sum), the path for selected males (M(S)), nonselected males (M(NS)), and females (F). For the medium-accuracy scenario, although the point estimate for the mean of selected males partition showed underestimation), the true means of partition of each path was within the 95% credible interval. For the high-accuracy scenario, we observed underestimation for females and selected males partition. Consequently, the underestimation of the total genetic mean was even higher because it is the sum of those two contributions, while non-selected males had a zero contribution. Figure 7 confirms this result by showing the difference between true and estimated means of partition over 30 replicates. Additional file 9: Fig. S10 shows that the observed deviations in both scenarios do not come from inadequately estimated Mendelian sampling terms. Hence, the source of error must be due to the inadequate estimation of the parent average terms.

Genetic variance and its partitions
The partitioning of genetic variance by paths in the medium-and high-accuracy scenarios in a single replicate are shown in Fig. 8. While we correctly estimated the overall trends in the total genetic variance and its partitions, we observed a slight overestimation for the female's and non-selected male's paths and its total in either the medium-or high-accuracy scenarios. However, from generation 1 to 20 in the high-accuracy scenario, the overestimation increased compared to the medium-accuracy scenario. These observations were also confirmed across 30 replicates for both scenarios (Fig. 9). Importantly, distribution over 30 replicates did not include zero in later generations indicating significant differences in the estimates from the true values. Figure 9 also shows an underestimation of genetic variance for the selected male's path in early generations (− 19 to 2), which leads to the underestimation of the total genetic variance in the high-accuracy scenario.  Table 2 Estimate and 95% confidence interval for the concordance correlation coefficient ( ρ c ) between the true and estimated statistic, and point estimates for the Pearson correlation coefficient ( ρ ); bias correction factor ( C b ); and root mean square deviation (RMSD) in each case within scenario  replicates (zero value is denoted with a dashed line and mean difference over replicates is denoted with a solid line and 95% quantile of differences over replicates is denoted with a ribbon) We have initially observed even larger differences but have addressed these by adequately accounting for inbreeding in setting up the A −1 . Ignoring inbreeding significantly impacted the estimates of genetic means and variances and their partitions (see Additional file 10: Fig.  S11, Additional file 11: Fig. S12, Additional file 12: Fig.  S13).

Discussion
We developed a method for partitioning the trends in genetic variance into contributions of different paths as an extension of the previous work with trends in the genetic mean of García-Cortés et al. [1] and Obsteter et al. [18]. The method used to infer the path contributions over generations is illustrated using a single-trait model; however, extension to multiple traits is straightforward and already implemented in AlphaPart. The extension presented here allows researchers to quantify the drivers of genetic variance in their breeding programmes in addition to the drivers of the genetic mean. Consequently, it could help quantify the dynamics between genetic mean and variance in global animal breeding [2,4], how different breeding schemes impact their long-term sustainability [26,27], and how much variability is introgressed in pre-breeding programmes [28,29]. Therefore, it is a powerful and valuable method for retrospective analysis and understanding how different groups of breeding individuals contribute to change in genetic mean and variance, a topic that has been discussed in the last few years [5,7,30]. Moreover, the partitioning analysis can contribute to future decisions in breeding strategies through analysis of past real data or by analysing a combination of real and simulated data to make inferences about future results. For this reason, we implemented this method in the AlphaPart R package. The extension has been available since version 0.9.3, and is freely available from CRAN.
The simulated cattle breeding programme with the medium-and high-accuracy scenarios illustrated the power of the partitioning method to summarise genetic trends in mean and variance. However, some care is needed when using the proposed method. We have shown that the path variable must be considered carefully because a specific choice can lead to a misinterpretation of the contributions, especially regarding the partition of genetic variance. To this end, we recommend plotting the distribution of partitioned breeding values, where partitions can be done with different variables of interest, like sex and selection status, in our study.
By partitioning the genetic mean and variance, we showed that in the high-accuracy scenario, the covariance between contributions of females and selected males plays an important role when partitioning the genetic variance. Consequently, in this case Var(a) < Var(a F ) + Var(a M ) , where F and M represent the female and male paths. Furthermore, most of the (additive) genetic variance in the breeding programme pertained to female and non-selected male paths, which were not the most relevant individuals for disseminating genetic gain, indicating that the selected male path drove changes in genetic mean. While this is an obvious result, it shows the power of the method for more complex cases. A negative correlation between female and selected male partitions in Fig. 4 and Additional file 4: Fig. S5 means that the partitions of genetic variance are not independent. Since the male partition contributes more and more over generations, the female partition has to contribute less, which induces negative covariance between them. In this sense, we demonstrated that variance partitions are not necessarily independent; therefore, they should not be analysed separately.
A negative covariance between breeding value partitions is expected in some cases. We are aware of two cases. The first is when paths represent sexes, as in this study. The second is when paths represent a foreign and a domestic breeding programme. Covariance arises from the proportional relationship between contribution of paths as well as their values, as shown in Additional file 13: Fig. S14 (case A). To illustrate this in the context of sex paths, assume we are mating the best male with a female. In this case, it is expected that the male path will contribute more to the next generation due to the higher intensity (and sometimes accuracy) of selection. Consequently, sires are often the main drivers of genetic change in a population. On the other hand, since the proportion of gene contribution from male and female paths to an offspring must sum to 1, if males contribute more to the value of the next generation, then females must contribute less. This relationship induces negative covariance. The same happens with foreign and domestic paths, assuming that we are importing individuals with high breeding values into a population. Suppose these individuals are well adapted to the environment of the population. In that case, the contribution of the foreign path will increase over time, and the contribution of the domestic path will decrease. This relationship will also induce negative covariance.
A positive covariance is not likely to happen when paths represent sexes in a target population for a reason explained in the previous paragraph. However, it can happen when we import individuals that are not welladapted to the domestic environment. Such individuals contribute negatively to the next generation of offspring (Additional file 13: Fig. S14 case B). In this case, the best genetic material from both domestic and imported paths is expected to contribute more to the next generation, which generates a positive covariance between the two paths that move in tandem in the same direction. Therefore, a positive covariance could be used to alert breeders about the negative impact of introgression since some imported animals are harming domestic genetic gain.
The results showed the overestimation of estimated partitions for genetic variance in the high-accuracy scenario, which originated from the model's lack of fit to the data as quantified by the too-high estimate of the base population genetic variance (Table 1) and low concordance correlation coefficient for estimates of genetic variance over generations ( Table 2). While our example is extreme with a low effective population size, it shows the importance of accurately estimating model parameters in populations under selection [31,32]. Namely, the quality of model parameters estimates impacts the downstream analyses, such as the partitioning of breeding values in this study.
This overestimation of the base population genetic variance and its partitions in the high-accuracy scenario is likely impacted by the lack of information in the pedigree-based model for such an intense selection and low effective population size ( Ne < 20 ) simulated in our study [12]. Namely, we have observed significant changes in the genetic variance of up to 75% over 40 generations. While the pedigree-based model can account for selection [31,32], it does not seem to account appropriately for such a significant change in genetic variance [6,7,12]. Therefore, our next step is to develop an extension of the partitioning method considering genomic data to overcome the issue of working with the expected probability of identity by descent from pedigrees by using the realised identity by descent or state from genomic data [33,34]. We have recently already extended the Sorensen et al. [6] method for temporal estimation of genetic variance with a pedigree-based model to work with genomic data. This extension enables quantifying changes in genetic variance due to changes in allele frequencies caused by drift and selection and changes in linkagedisequilibrium caused by selection (the Bulmer effect). Extending the partitioning method of García-Cortés et al. [1] and current work with such genomic insights is a natural next step.

Conclusions
We developed a method to quantify the drivers of genetic variance in breeding programmes by partitioning the genetic variance by analyst-defined paths. The method developed can provide a comprehensive overview of breeding practises, either based on past results or through simulated scenarios, as shown in this study. Moreover, the covariance between paths can inform the breeder about the dynamics of contributions and can be used to identify potential pitfalls of the breeding programme.
The method can be easily applied to real data by leveraging established software to draw posterior breeding values samples given the observed phenotype data. Working with the posterior sample of breeding values also enables straightforward uncertainty quantification in evaluated partitions and their summaries, mean and variance.
We observed some overestimation of genetic variance and its partitions, but this was caused by the extreme selection in our simulation study and the pedigreebased model, which showed a lack of fit with respect to the observed genetic change in mean and variance. Our future research will extend the proposed method using genomic data to overcome the limitations of the pedigree-based model under such extreme selection settings.